Deep-learning model for evaluating histopathology of acute renal tubular injury

Tubular injury is the most common cause of acute kidney injury. Histopathological diagnosis may help distinguish between the different types of acute kidney injury and aid in treatment. To date, a limited number of studies have used deep-learning models to assist in the histopathological diagnosis of acute kidney injury. This study aimed to perform histopathological segmentation to identify the four structures of acute renal tubular injury using deep-learning models. A segmentation model was used to classify tubule-specific injuries following cisplatin treatment. A total of 45 whole-slide images with 400 generated patches were used in the segmentation model, and 27,478 annotations were created for four classes: glomerulus, healthy tubules, necrotic tubules, and tubules with casts. A segmentation model was developed using the DeepLabV3 architecture with a MobileNetV3-Large backbone to accurately identify the four histopathological structures associated with acute renal tubular injury in PAS-stained mouse samples. In the segmentation model for four structures, the highest Intersection over Union and Dice coefficient were obtained for the “glomerulus” class, followed by the “necrotic tubules,” “healthy tubules,” and “tubules with cast” classes. The overall performance of the segmentation algorithm for all classes in the test set included an Intersection over Union of 0.7968 and a Dice coefficient of 0.8772. The Dice scores for the glomerulus, healthy tubules, necrotic tubules, and tubules with cast were 91.78 ± 11.09%, 87.37 ± 4.02%, 88.08 ± 6.83%, and 83.64 ± 20.39%, respectively. The use of deep learning in a predictive model demonstrated promising performance in accurately identifying the degree of renal tubular injury. These results may provide new opportunities for the application of the proposed methods to evaluate renal pathology more effectively.


Datasets
Forty-five whole-slide images (WSIs) with 400 generated patches were used for the development of the segmentation model. Ground-truth annotations were created using the SUPERVISELY polygon tool (supervisely.com). Polygons mark segment annotations by placing waypoints along the boundaries of the objects that the model must segment. All annotations were reviewed by three nephrologists with extensive experience in nephropathology, who resolved disagreements through discussion. Four predefined classes were annotated: (1) glomerulus, (2) healthy tubules, (3) necrotic tubules, and (4) tubules with casts. Figure 1A, B, and C show examples of whole-slide images of H&E- and PAS-stained kidney sections obtained using a slide scanner and a randomly generated patch without annotations, respectively. The annotations, consisting of four different structures ('glomerulus,' 'healthy tubules,' 'necrotic tubules,' and 'tubules with cast'), are shown in Fig. 1D. In total, 27,478 annotations, along with their corresponding patches, were partitioned into two distinct subsets: a training subset comprising 80% of the data and a testing subset constituting the remaining 20%. Patches from the same WSI never appeared in both the training and testing subsets, to ensure robust generalization of the segmentation models. Subsequently, to fine-tune the model hyperparameters, the training subset was further split at random into training (80%) and validation (20%) subsets. This approach allowed the hyperparameters to be adjusted iteratively based on the validation set while preserving the independence of the testing set for the final evaluation of model generalization (Table 1 and Figs. 2, 3 and 4).
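The slide-level separation described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `split_by_wsi`, the patch and slide identifiers, and the fixed seed are all hypothetical, and the split is assumed to be drawn over slides rather than over individual patches.

```python
import random
from collections import defaultdict


def split_by_wsi(patch_to_wsi, test_fraction=0.2, seed=0):
    """Split patches into train/test so that no WSI contributes to both."""
    # Group patch identifiers by the slide they were cut from.
    by_wsi = defaultdict(list)
    for patch_id, wsi_id in patch_to_wsi.items():
        by_wsi[wsi_id].append(patch_id)

    # Shuffle the slides (not the patches) and hold out a fraction of them.
    wsi_ids = sorted(by_wsi)
    random.Random(seed).shuffle(wsi_ids)
    n_test = max(1, round(len(wsi_ids) * test_fraction))
    test_wsis = set(wsi_ids[:n_test])

    train = [p for w in wsi_ids if w not in test_wsis for p in by_wsi[w]]
    test = [p for w in wsi_ids if w in test_wsis for p in by_wsi[w]]
    return train, test, test_wsis


# Hypothetical identifiers: 400 patches spread over 45 slides.
patch_to_wsi = {f"patch_{i:03d}": f"wsi_{i % 45:02d}" for i in range(400)}
train, test, test_wsis = split_by_wsi(patch_to_wsi)
```

Because the split is drawn over slides, the resulting patch proportions are only approximately 80/20. The same function could be applied again to the training patches to obtain a validation subset.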

Preprocessing
Because the pathology images were represented in an RGB data structure, the pixel values of the images ranged from 0 to 255. The pixels were scaled to a range between zero and one to avoid exploding gradients during the training phase. The patch images were resized to 512 × 512 pixels before being fed into the deep-learning model for segmentation. Three different augmentation methods were used to address overfitting resulting from a limited number of samples: horizontal flipping, rotation, and brightness adjustment. The third augmentation method was used because of varying degrees of slide brightness. Although we performed PAS staining for all histological slides using the same protocol, the degree of staining and, consequently, the overall brightness of the specimen

Proposed model framework
In this study, we used DeepLabV3 [16], a two-stage segmentation framework, for the segmentation task. The DeepLabV3 encoder consists of Atrous Spatial Pyramid Pooling (ASPP) blocks that allow it to maintain the field of view (FOV) of the network layers and effectively capture contextual information at different scales. Moreover, DeepLabV3 uses dilated, or "atrous," convolution layers to preserve high-precision predictions while keeping a wide FOV. This is particularly critical for histopathological imaging because of the fine-grained structures and textures. In addition, the dense structure of the images leads to an extreme foreground-background class imbalance. To overcome this challenge, we used an objective function that is the sum of the Dice Loss [17] and Focal Loss [18] functions. Unlike classification tasks, the outputs of segmentation problems are continuous rather than categorical. Thus, Dice Loss is particularly suitable for continuous maps because it measures the overlap between a prediction and its target. Furthermore, Dice Loss is independent of the statistical distribution of labels and penalizes misclassifications based on the overlap between the predicted regions and ground truths. The second part of our objective function is the Focal Loss function, which was introduced with the RetinaNet [18] deep-learning model to mitigate the class-imbalance problem in dense object detection. Furthermore, we integrated DeepLabV3 with a MobileNet backbone designed for mobile and embedded devices so that the developed model can be applied to devices with limited computational resources in clinical environments.
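The effect of dilation on the field of view can be illustrated in one dimension. The sketch below is a toy example rather than the DeepLabV3 implementation; the function names and the input signal are hypothetical. It shows that a three-tap kernel with a dilation of 3 covers seven input positions while still using only three weights.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid 1D cross-correlation with `dilation - 1` gaps between kernel taps."""
    span = (len(kernel) - 1) * dilation + 1  # input positions covered per output
    return [
        sum(w * signal[start + j * dilation] for j, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


def receptive_field(kernel_size, dilation):
    """Number of input positions seen by one output of a dilated kernel."""
    return (kernel_size - 1) * dilation + 1


signal = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
kernel = [1, 1, 1]

dense = dilated_conv1d(signal, kernel, dilation=1)   # sums of 3 neighbours
atrous = dilated_conv1d(signal, kernel, dilation=3)  # sums of every 3rd sample
```

ASPP applies several such dilation rates in parallel and combines the results, which is how context at different scales is captured without adding kernel weights.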
As presented in Table 2, our datasets were imbalanced, with the number of annotations for the glomerulus class being relatively small compared to the other classes. To address this issue, the objective function assigns a higher weight to examples in the minority class and a lower weight to those in the majority class. Mathematically, the objective function can be described by the following equation: (1) where y, p, and γ correspond to the ground truth, the model prediction, and the parameter that controls the degree of focus on difficult examples, respectively. If γ is set to 0, the Focal Loss reduces to the standard cross-entropy loss. The proposed model was implemented using PyTorch [19], and the loss function was obtained from the MONAI library [19,20]. The training procedure took approximately 4 h on an RTX 3090 graphics processing unit (GPU) with 24 GB of memory.
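A summed Dice and Focal objective can be sketched for the binary, per-pixel case as follows. This is a minimal illustration under the standard definitions of the two terms, not the MONAI implementation used in the study: the function names, the smoothing constant, and the omission of per-class weights are all assumptions made here.

```python
import math


def cross_entropy(y, p, eps=1e-7):
    """Mean binary cross-entropy over flattened pixels."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)
        total += -math.log(pi if yi == 1 else 1.0 - pi)
    return total / len(y)


def focal_loss(y, p, gamma=2.0, eps=1e-7):
    """Mean binary focal loss: -(1 - p_t)^gamma * log(p_t)."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)
        p_t = pi if yi == 1 else 1.0 - pi  # probability given to the true label
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(y)


def dice_loss(y, p, smooth=1e-5):
    """Soft Dice loss: 1 - 2*sum(y*p) / (sum(y) + sum(p))."""
    intersection = sum(yi * pi for yi, pi in zip(y, p))
    return 1.0 - (2.0 * intersection + smooth) / (sum(y) + sum(p) + smooth)


def dice_focal_loss(y, p, gamma=2.0):
    """Unweighted sum of the two terms (any term weighting is an assumption)."""
    return dice_loss(y, p) + focal_loss(y, p, gamma)


y = [1, 0, 1, 0]          # toy ground-truth labels
p = [0.9, 0.2, 0.6, 0.4]  # toy predicted foreground probabilities
```

With `gamma=0.0` the focal term equals the cross-entropy, and with `gamma=2.0` well-classified pixels are down-weighted so that the harder pixels dominate the loss.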

Data analyses
Network performance was quantitatively assessed using instance-level DICE and IoU scores, which are commonly used to evaluate segmentation algorithms. Both measure the similarity between the predicted segmentation mask and the ground-truth mask. DICE is the ratio of twice the intersection of the two masks to the sum of their areas, whereas IoU is the ratio of their intersection to their union. In addition, sensitivity, specificity, and accuracy were calculated. In this study, we used these metrics to evaluate the performance of the proposed system comprehensively.
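The metrics above can be computed from the confusion counts of a predicted and a ground-truth mask. The sketch below works on flattened binary masks at the pixel level, which is a simplifying assumption (the study reports instance-level scores); the function names and the toy masks are hypothetical.

```python
def confusion_counts(pred, truth):
    """Return (tp, fp, fn, tn) for two flattened binary masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    return tp, fp, fn, tn


def safe_div(num, den):
    """Avoid division by zero for empty masks."""
    return num / den if den else 0.0


def segmentation_metrics(pred, truth):
    tp, fp, fn, tn = confusion_counts(pred, truth)
    return {
        "dice": safe_div(2 * tp, 2 * tp + fp + fn),
        "iou": safe_div(tp, tp + fp + fn),
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
    }


pred = [1, 1, 1, 0, 0, 0, 0, 0]   # toy predicted mask
truth = [1, 1, 0, 1, 0, 0, 0, 0]  # toy ground-truth mask
metrics = segmentation_metrics(pred, truth)
```

For a single pair of masks the two overlap scores are linked by DICE = 2·IoU / (1 + IoU). This identity does not carry over to averages across images or classes.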

Comparison with other models
In our comparative analysis, we used U-Net [21] and SegFormer [22], two widely used neural network architectures. U-Net, a convolutional neural network architecture for semantic segmentation, features a distinctive U-shaped design comprising contracting, bottleneck, and expansive paths. It excels at capturing intricate spatial features and is known for its success in medical image segmentation tasks. SegFormer, a state-of-the-art algorithm for segmentation, adopts a transformer-based architecture [23] with a lightweight multilayer perceptron decoder. It demonstrates an extremely high level of performance on the Cityscapes [24] dataset, highlighting its effectiveness in diverse computer vision applications. We applied the standard architectures of U-Net and SegFormer without modification and used the same training, validation, and test subsets as in our model. The DICE and IoU values of U-Net and SegFormer were measured for comparison.

Statistical analyses
We used one-way ANOVA (or t-tests) to compare DeepLabV3, U-Net, and SegFormer on their respective Dice and IoU coefficients. P < 0.05 was considered statistically significant.

Model parameter optimization
We trained the model using the following hyperparameters: a learning rate of 0.5, a batch size of 32, 60 epochs, and a γ of 2. We evaluated the performance of each combination of hyperparameters using a held-out validation dataset. We found that the learning rate had a significant impact on model performance, with higher learning rates leading to faster convergence but a lower Dice coefficient (DICE) and Intersection over Union (IoU). In contrast, a lower learning rate resulted in overfitting. The batch size had a less pronounced effect, with a larger batch size generally resulting in faster convergence and improved validation performance. In addition to the learning rate and batch size, we found that model performance was very sensitive to the γ of the Focal Loss. A small value led to overfitting of the majority classes, whereas a large value resulted in poor performance on the training dataset.

Performance of segmentation model
The effectiveness of the proposed segmentation model for each class is summarized in

Comparison with other studies
We compared our model with existing state-of-the-art methods (U-Net and SegFormer) for histopathological assessment of renal tubular injury. Table 3 presents a comparison between the performances of the three models for the testing subset. Our model (DeepLabV3) exhibited a comparable or slightly better performance than SegFormer. The performance of the proposed model was better than that of U-Net, particularly in segmenting necrotic tubules and tubules with cast.

Discussion
Over the last decade, numerous studies have focused on the development of deep-learning models for nephropathology. In several previous studies, neural networks have been trained and successfully applied to specific glomerular segmentation tasks, such as distinguishing between glomerular and non-glomerular regions and classifying healthy and injured glomeruli in WSIs of both human disease and animal models [25][26][27]. In 2020, Uchino et al. developed a comprehensive deep-learning model to classify multiple glomerular images and suggested its potential use in enhancing the diagnostic accuracy for clinicians [28].
The initial results of the multiclass segmentation task for kidneys were reported in 2018 [29]. The authors proposed a method for renal segmentation of PAS-stained digital slides of renal allograft resections using CNNs for nine classes, including five healthy structures (glomerulus, distal tubules, proximal tubules, arterioles, and capillaries) and four pathological structures (atrophic tubules, sclerotic glomeruli, fibrotic tissue, and inflammatory infiltrates). Three different network architectures were used to perform this task: a fully convolutional network, U-Net, and a multiscale fully convolutional network.
Another CNN for the multiclass segmentation of kidney sections with PAS staining was developed by Hermsen et al. [30]. Dice coefficients were used to assess the segmentation performance for ten classes (glomerulus, sclerotic glomerulus, empty Bowman's capsules, proximal tubules, distal tubules, atrophic tubules, undefined tubules, arteries, interstitium, and capsule) of nephrectomy and transplant biopsy specimens. In both datasets, the glomerulus was the best-segmented class (Dice coefficients of 0.95 and 0.94) [30]. Recently, Bouteldja et al. published high-performance deep-learning algorithms for the multiclass segmentation of kidney histology for various diseases in mouse models and other species. In that study, six annotated structures were used: tubules, full glomerulus, glomerular tuft, artery, arterial lumen, and vein [31]. Although previous studies have focused on developing models for segmenting renal tubular structures, the predefined classes of tubules included only normal tubular types, such as proximal and distal tubules, or abnormal tubular types, such as atrophic tubular structures, in a renal fibrosis model [32].
To the best of our knowledge, there have been a limited number of reports on segmentation models for identifying injured tubules in patients with acute kidney injury. Our study presents a deep learning-based segmentation model for evaluating acute renal tubular injury in digitized PAS-stained images. We applied deep-learning models to identify the typical structural types of toxicity-induced acute tubular injuries, including glomeruli, healthy tubules, necrotic tubules, and tubules with casts. The DICE scores and IoU showed high and consistent performances in the segmentation of these regions. Notably, the performance of the proposed model was the highest for the glomerulus despite the glomerulus class having the smallest number of annotations. This suggests that the performance of the model can be improved further by adding more training data, particularly for the glomerulus class. Overall, the results suggest that the proposed segmentation model has the potential to be used in clinical applications for the accurate identification and segmentation of different kidney structures, particularly injured tubules. In the future, we intend to translate the technique developed in this study to a human biopsy dataset. As a dissociation exists between histopathological findings and the clinical symptoms of AKI in some cases (such as volume depletion-induced AKI in allergic, cardiogenic, or hemorrhagic shock), renal biopsy may assist in assessing structural injury, differentiating the cause of AKI, and aiding in treatment [1].
The proposed approach exhibited a similar or slightly higher performance than the state-of-the-art models. The mean DICE values for SegFormer and U-Net were 81.49% (ranging from 75.69 to 86.69%) and 70.27% (ranging from 53.66 to 82.18%), respectively, across the four classes, whereas our model yielded a mean DICE of 87.71% (ranging from 83.64 to 91.78%). The mean IoUs for SegFormer and U-Net were 69.97% (ranging from 61.55 to 76.77%) and 62.48% (ranging from 53.41 to 72.78%), respectively, across the four classes, whereas our model had a mean IoU of 79.68% (ranging from 75.49 to 86.09%). Therefore, compared with previously used methods for assessing renal tubular injury, the method proposed in this study may be effective for identifying injured renal tubules in acute kidney injury in terms of segmentation performance and computational complexity. It is noteworthy that our model exhibited a comparable or slightly better performance than SegFormer with substantially lower computational complexity. SegFormer has a high parameter count of 64 million, whereas our model, DeepLabV3 with a MobileNet backbone, is relatively efficient, with only 11 million parameters. This efficiency underscores the potential practical advantages of our model in terms of computational resources and model complexity.
Our study has some limitations. First, the deep-learning model was developed to evaluate histological images of murine cisplatin-induced acute tubular injury. Although the histological structures of the mouse and human kidneys are similar, the distance or connective tissue area among the structures in the mouse kidney tissue is relatively small compared to that in humans. These closely located structures make it more difficult to distinguish the boundaries between them, particularly in necrotic areas where the basement membranes are occasionally not intact. Second, the number of WSIs and patches generated in this study was limited. A study that includes a larger number of annotations is underway and is expected to achieve higher performance in training the model. Third, when substances such as casts are present in the injured tubular lumen, the effectiveness of measuring the degree of tubular injury decreases.

Conclusion
The deep-learning segmentation model developed in this study can accurately identify the histopathological structures of injured renal tubules. The results serve as the basis for future studies with larger datasets, including mouse and human biopsy samples, which can provide new opportunities for applying the proposed methods to renal pathology.

Figure 1. A-B Whole-slide images of H&E (A) and PAS (B)-stained kidney sections digitized using a slide scanner at 40× magnification. Randomly generated patch without annotations. B H&E and PAS staining images of healthy tubules, necrotic tubules, and tubules with casts after cisplatin administration. C Randomly generated patch with annotations comprising four different structures: "glomerulus," "healthy tubules," "necrotic tubules," and "tubules with cast".

Figure 2. Representative PAS-stained images, ground-truth masks, and predicted masks generated by the CNNs in the training set.

Figure 3. Representative PAS-stained images, ground-truth masks, and predicted masks generated by the CNNs in the validation set.

Figure 4. Representative PAS-stained images, ground-truth masks, and predicted masks generated by the CNNs in the test set.

Table 1. The number of annotations in each class used in the training and test sets for the segmentation model.

Table 2. Quantitative segmentation performance of four classes in the acute tubular injury images in the training, validation, and testing sets.

Table 3. Comparison of testing performance between our model (DeepLabV3), SegFormer, and U-Net.